[SPARK-42228][BUILD][CONNECT] Add shade and relocation rule of grpc to connect-client-jvm module #39789
Conversation
cc @HyukjinKwon. The test demo is like #39788. We can use the following command to run a comparison test: with shaded grpc the test passes, and without shaded grpc it fails.
Further, I think we should move the e2e tests out of this module. Also cc @zhenlineo

Also cc @dongjoon-hyun
dongjoon-hyun
left a comment
Although the JIRA is filed as a bug on Spark 3.5, this looks like a release blocker to me? Could you confirm that, @LuciferYang , @HyukjinKwon , @hvanhovell , @xinrong-meng ?
If Spark 3.4.0 needs to ship the Spark Connect Java client feature, I think this is a blocker.
@LuciferYang yes, 3.4 should ship with the JVM client.

cc @grundprinzip too
Thinking about it again: because of the behavior difference between Maven shade and sbt assembly, even if we move the e2e tests out of this module ...
@hvanhovell Got it. Is there a user guide document for the JVM client? I would like to know how we expect end users to use the JVM client. Thanks ~
Also, cc @sunchao
Hi @LuciferYang, thanks for fixing this error. Regarding the E2E tests, I want to clarify their purpose: they are used by the Scala client to verify that client methods return the correct results. Due to the nature of client development, we can hardly verify many behaviors without a real server, so the E2E suite contains a lot of very basic tests and needs to run daily. It was not meant to test server or client shading. The reason we use the server's shaded jar is that it is the only way to start Spark with Spark Connect. I am not really in favor of using the E2E suite for client/server shading tests as well. Can I suggest we add a separate test for client/server shading, following the idea that each test should test only one thing? Many thanks for helping to look into better E2E tests. I also tried putting the E2E tests in a separate module, but I did not get much benefit and only created an empty module containing nothing but tests. It would be really great if you could work out a way to run these tests as if they were unit tests. Thanks again.
Why is this the only way? Because of the conflicting package names of ...
@LuciferYang Some grpc dependencies are not set up correctly if we use the Spark Connect server jar without shading.
@LuciferYang I checked the Maven jar and found that io.grpc is still not shaded, and guava is shaded twice. To fix the Maven jar, I need the following changes on top of your PR:
Thanks for your review @zhenlineo, let me double-check this today.
This reverts commit dbbf2ba.
@zhenlineo Do you have time to fix this issue in an independent PR? It seems unrelated to the current PR, and it is the issue you found.
```
        </includes>
      </relocation>
    </relocations>
    <!--SPARK-42228: Add `ServicesResourceTransformer` to relocate class names in META-INF/services for grpc-->
```
This is added to fix:

```
org.apache.spark.sql.ClientE2ETestSuite *** ABORTED ***
  org.sparkproject.connect.grpc.ManagedChannelProvider$ProviderNotFoundException: No functional channel service provider found. Try adding a dependency on the grpc-okhttp, grpc-netty, or grpc-netty-shaded artifact
  at org.sparkproject.connect.grpc.ManagedChannelProvider.provider(ManagedChannelProvider.java:45)
  at org.sparkproject.connect.grpc.ManagedChannelBuilder.forAddress(ManagedChannelBuilder.java:39)
  at org.apache.spark.sql.connect.client.SparkConnectClient$Builder.build(SparkConnectClient.scala:191)
  at org.apache.spark.sql.SparkSession$Builder.<init>(SparkSession.scala:143)
  at org.apache.spark.sql.SparkSession$.builder(SparkSession.scala:134)
  at org.apache.spark.sql.connect.client.util.RemoteSparkSession.beforeAll(RemoteSparkSession.scala:163)
  at org.apache.spark.sql.connect.client.util.RemoteSparkSession.beforeAll$(RemoteSparkSession.scala:160)
  at org.apache.spark.sql.ClientE2ETestSuite.beforeAll(ClientE2ETestSuite.scala:22)
  at org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:212)
  at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210)
  ...
```
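For reference, a minimal sketch of how `ServicesResourceTransformer` sits alongside a relocation rule in `maven-shade-plugin`. The relocated prefix mirrors the output shown in this PR, but treat the exact configuration as illustrative rather than the actual Spark pom:

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <configuration>
    <relocations>
      <relocation>
        <pattern>io.grpc</pattern>
        <shadedPattern>org.sparkproject.connect.client.grpc</shadedPattern>
      </relocation>
    </relocations>
    <transformers>
      <!-- Rewrites both the names and the contents of META-INF/services files
           so that java.util.ServiceLoader finds the relocated providers -->
      <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
    </transformers>
  </configuration>
</plugin>
```

Without the transformer, the relocated grpc classes look up service files under their new names, find nothing, and `ManagedChannelProvider.provider` fails with the `ProviderNotFoundException` above.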
Before:

```
$ ls META-INF/services
io.grpc.LoadBalancerProvider  io.grpc.ManagedChannelProvider  io.grpc.NameResolverProvider  io.grpc.ServerProvider

$ cat META-INF/services/*
io.grpc.internal.PickFirstLoadBalancerProvider
io.grpc.util.SecretRoundRobinLoadBalancerProvider$Provider
io.grpc.netty.NettyChannelProvider
io.grpc.netty.UdsNettyChannelProvider
io.grpc.netty.UdsNameResolverProvider
io.grpc.netty.NettyServerProvider
```

After:

```
$ ls META-INF/services/
org.sparkproject.connect.client.grpc.LoadBalancerProvider    org.sparkproject.connect.client.grpc.NameResolverProvider
org.sparkproject.connect.client.grpc.ManagedChannelProvider  org.sparkproject.connect.client.grpc.ServerProvider

$ cat META-INF/services/*
org.sparkproject.connect.client.grpc.internal.PickFirstLoadBalancerProvider
org.sparkproject.connect.client.grpc.util.SecretRoundRobinLoadBalancerProvider$Provider
org.sparkproject.connect.client.grpc.protobuf.services.internal.HealthCheckingRoundRobinLoadBalancerProvider
org.sparkproject.connect.client.grpc.netty.NettyChannelProvider
org.sparkproject.connect.client.grpc.netty.UdsNettyChannelProvider
org.sparkproject.connect.client.grpc.netty.UdsNameResolverProvider
org.sparkproject.connect.client.grpc.internal.DnsNameResolverProvider
org.sparkproject.connect.client.grpc.netty.NettyServerProvider
```
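What the transformer must do can be sketched in plain shell: it renames each service file *and* rewrites the provider class names inside it. The paths and relocation prefix below are illustrative, not taken from the actual build:

```shell
# Simulate relocating a grpc service-provider file, as ServicesResourceTransformer does.
# Both the file NAME and its CONTENTS must move from io.grpc to the shaded prefix.
mkdir -p META-INF/services
echo 'io.grpc.netty.NettyChannelProvider' > META-INF/services/io.grpc.ManagedChannelProvider

src='io.grpc'
dst='org.sparkproject.connect.client.grpc'
for f in META-INF/services/"$src".*; do
  base=${f##*/}                              # e.g. io.grpc.ManagedChannelProvider
  new="META-INF/services/$dst.${base#"$src".}"
  sed "s/^$src\./$dst./" "$f" > "$new" && rm "$f"
done

cat "META-INF/services/$dst.ManagedChannelProvider"
```

Renaming only the file (or only its contents) would leave `ServiceLoader` unable to match the relocated interface to a provider, which is exactly the `ProviderNotFoundException` failure mode.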
```
        </includes>
      </artifactSet>
      <relocations>
        <relocation>
```
After:

```
$ ls org/sparkproject/connect/client
grpc  guava

$ ls org/sparkproject/connect/client/grpc
Attributes$1.class                       LoadBalancer$Helper.class
Attributes$Builder.class                 LoadBalancer$PickResult.class
Attributes$Key.class                     LoadBalancer$PickSubchannelArgs.class
Attributes.class                         LoadBalancer$ResolvedAddresses$Builder.class
BinaryLog.class                          LoadBalancer$ResolvedAddresses.class
BindableService.class                    LoadBalancer$Subchannel.class
CallCredentials$MetadataApplier.class    LoadBalancer$SubchannelPicker.class
CallCredentials$RequestInfo.class        LoadBalancer$SubchannelStateListener.class
CallCredentials.class                    LoadBalancer.class
CallOptions$Key.class                    LoadBalancerProvider$UnknownConfig.class
....
```

All grpc Java classes are relocated to `org/sparkproject/connect/client/grpc`.
Could you take a look, @hvanhovell? This one is important, thanks ~
Merging to master/3.4 |
[SPARK-42228][BUILD][CONNECT] Add shade and relocation rule of grpc to connect-client-jvm module
### What changes were proposed in this pull request?
When I tried to run an E2E test for the Java Connect client and Connect server outside of the `connect-client-jvm` module (for example, by moving `ClientE2ETestSuite` into a separate module and running the Maven tests), I hit the following errors:
```
ClientE2ETestSuite:
Starting the Spark Connect Server...
Using jar: /${basedir}/spark-mine/connector/connect/server/target/spark-connect_2.12-3.5.0-SNAPSHOT.jar
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://localhost:4040
Spark context available as 'sc' (master = local[*], app id = local-1674980902694).
Spark session available as 'spark'.
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 3.5.0-SNAPSHOT
/_/
Using Scala version 2.12.17 (OpenJDK 64-Bit Server VM, Java 1.8.0_352)
Type in expressions to have them evaluated.
Type :help for more information.
java.lang.RuntimeException: Failed to start the test server on port 15290.
at org.apache.spark.sql.connect.client.util.RemoteSparkSession.beforeAll(RemoteSparkSession.scala:158)
at org.apache.spark.sql.connect.client.util.RemoteSparkSession.beforeAll$(RemoteSparkSession.scala:149)
at org.apache.spark.sql.ClientE2ETestSuite.beforeAll(ClientE2ETestSuite.scala:22)
at org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:212)
at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210)
at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208)
at org.apache.spark.sql.ClientE2ETestSuite.run(ClientE2ETestSuite.scala:22)
at org.scalatest.Suite.callExecuteOnSuite$1(Suite.scala:1178)
at org.scalatest.Suite.$anonfun$runNestedSuites$1(Suite.scala:1225)
at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198)
at org.scalatest.Suite.runNestedSuites(Suite.scala:1223)
at org.scalatest.Suite.runNestedSuites$(Suite.scala:1156)
at org.scalatest.tools.DiscoverySuite.runNestedSuites(DiscoverySuite.scala:30)
at org.scalatest.Suite.run(Suite.scala:1111)
at org.scalatest.Suite.run$(Suite.scala:1096)
at org.scalatest.tools.DiscoverySuite.run(DiscoverySuite.scala:30)
at org.scalatest.tools.SuiteRunner.run(SuiteRunner.scala:47)
at org.scalatest.tools.Runner$.$anonfun$doRunRunRunDaDoRunRun$13(Runner.scala:1321)
at org.scalatest.tools.Runner$.$anonfun$doRunRunRunDaDoRunRun$13$adapted(Runner.scala:1315)
at scala.collection.immutable.List.foreach(List.scala:431)
at org.scalatest.tools.Runner$.doRunRunRunDaDoRunRun(Runner.scala:1315)
at org.scalatest.tools.Runner$.$anonfun$runOptionallyWithPassFailReporter$24(Runner.scala:992)
at org.scalatest.tools.Runner$.$anonfun$runOptionallyWithPassFailReporter$24$adapted(Runner.scala:970)
at org.scalatest.tools.Runner$.withClassLoaderAndDispatchReporter(Runner.scala:1481)
org.apache.spark.sql.ClientE2ETestSuite *** ABORTED ***
at org.scalatest.tools.Runner$.runOptionallyWithPassFailReporter(Runner.scala:970)
java.lang.RuntimeException: Failed to start the test server on port 15290.
at org.apache.spark.sql.connect.client.util.RemoteSparkSession.beforeAll(RemoteSparkSession.scala:158)
at org.apache.spark.sql.connect.client.util.RemoteSparkSession.beforeAll$(RemoteSparkSession.scala:149)
at org.apache.spark.sql.ClientE2ETestSuite.beforeAll(ClientE2ETestSuite.scala:22)
at org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:212)
at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210)
at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208)
at org.apache.spark.sql.ClientE2ETestSuite.run(ClientE2ETestSuite.scala:22)
at org.scalatest.Suite.callExecuteOnSuite$1(Suite.scala:1178)
at org.scalatest.Suite.$anonfun$runNestedSuites$1(Suite.scala:1225)
at org.scalatest.tools.Runner$.main(Runner.scala:775)
at org.scalatest.tools.Runner.main(Runner.scala)
Suppressed: java.lang.NoSuchMethodError: io.grpc.protobuf.ProtoUtils.marshaller(Lorg/sparkproject/connect/protobuf/Message;)Lio/grpc/MethodDescriptor$Marshaller;
at org.apache.spark.connect.proto.SparkConnectServiceGrpc.getExecutePlanMethod(SparkConnectServiceGrpc.java:40)
at org.apache.spark.connect.proto.SparkConnectServiceGrpc$SparkConnectServiceBlockingStub.executePlan(SparkConnectServiceGrpc.java:242)
at org.apache.spark.sql.connect.client.SparkConnectClient.execute(SparkConnectClient.scala:64)
at org.apache.spark.sql.SparkSession.execute(SparkSession.scala:119)
at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
at org.apache.spark.sql.Dataset.collectResult(Dataset.scala:73)
at org.apache.spark.sql.connect.client.util.RemoteSparkSession.beforeAll(RemoteSparkSession.scala:164)
... 28 more
...
Suppressed: java.lang.NoSuchMethodError: io.grpc.protobuf.ProtoUtils.marshaller(Lorg/sparkproject/connect/protobuf/Message;)Lio/grpc/MethodDescriptor$Marshaller;
at org.apache.spark.connect.proto.SparkConnectServiceGrpc.getExecutePlanMethod(SparkConnectServiceGrpc.java:40)
at org.apache.spark.connect.proto.SparkConnectServiceGrpc$SparkConnectServiceBlockingStub.executePlan(SparkConnectServiceGrpc.java:242)
at org.apache.spark.sql.connect.client.SparkConnectClient.execute(SparkConnectClient.scala:64)
at org.apache.spark.sql.SparkSession.execute(SparkSession.scala:119)
at org.apache.spark.sql.Dataset.collectResult(Dataset.scala:73)
at org.apache.spark.sql.connect.client.util.RemoteSparkSession.beforeAll(RemoteSparkSession.scala:164)
... 28 more
Suppressed: java.lang.NoSuchMethodError: io.grpc.protobuf.ProtoUtils.marshaller(Lorg/sparkproject/connect/protobuf/Message;)Lio/grpc/MethodDescriptor$Marshaller;
at org.apache.spark.connect.proto.SparkConnectServiceGrpc.getExecutePlanMethod(SparkConnectServiceGrpc.java:40)
at org.apache.spark.connect.proto.SparkConnectServiceGrpc$SparkConnectServiceBlockingStub.executePlan(SparkConnectServiceGrpc.java:242)
at org.apache.spark.sql.connect.client.SparkConnectClient.execute(SparkConnectClient.scala:64)
at org.apache.spark.sql.SparkSession.execute(SparkSession.scala:119)
at org.apache.spark.sql.Dataset.collectResult(Dataset.scala:73)
at org.apache.spark.sql.connect.client.util.RemoteSparkSession.beforeAll(RemoteSparkSession.scala:164)
... 28 more
Suppressed: java.lang.NoSuchMethodError: io.grpc.protobuf.ProtoUtils.marshaller(Lorg/sparkproject/connect/protobuf/Message;)Lio/grpc/MethodDescriptor$Marshaller;
at org.apache.spark.connect.proto.SparkConnectServiceGrpc.getExecutePlanMethod(SparkConnectServiceGrpc.java:40)
at org.apache.spark.connect.proto.SparkConnectServiceGrpc$SparkConnectServiceBlockingStub.executePlan(SparkConnectServiceGrpc.java:242)
at org.apache.spark.sql.connect.client.SparkConnectClient.execute(SparkConnectClient.scala:64)
Run completed in 1 minute, 3 seconds.
at org.apache.spark.sql.SparkSession.execute(SparkSession.scala:119)
```
The key error message is `java.lang.NoSuchMethodError: io.grpc.protobuf.ProtoUtils.marshaller(Lorg/sparkproject/connect/protobuf/Message;)Lio/grpc/MethodDescriptor$Marshaller;`.
The failure happens because when we package the `connect-client-jvm` module, we relocate the protobuf classes but leave the corresponding grpc API unchanged; grpc therefore still expects the original `com.google.protobuf.Message` signature, and testing the `connect-client-jvm` shaded jar throws the above `NoSuchMethodError`.
So this PR adds shade and relocation rules for grpc to the `connect-client-jvm` module, for both Maven and sbt. After this change, the above test runs successfully.
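A quick way to sanity-check a shaded jar like this is to list its entries and look for classes left under the original package. A self-contained sketch follows; the jar name is hypothetical, and `zip`/`unzip` stand in for a real build artifact:

```shell
# Build a toy "shaded" jar and verify no un-relocated io.grpc classes remain.
mkdir -p demo/org/sparkproject/connect/client/grpc
: > demo/org/sparkproject/connect/client/grpc/ManagedChannelProvider.class
(cd demo && zip -qr ../client-shaded.jar .)

# A correctly shaded client jar should contain no io/grpc/** entries.
if unzip -l client-shaded.jar | grep -q ' io/grpc/'; then
  echo 'FAIL: un-relocated io.grpc classes found'
else
  echo 'OK: io.grpc fully relocated'
fi
```

The same grep against the real `connect-client-jvm` jar would have caught both problems reported above: `io.grpc` left unshaded by Maven, and the pre-fix `NoSuchMethodError` mismatch between relocated protobuf and non-relocated grpc.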
### Why are the changes needed?
Make `connect-client-jvm` shaded jar usable.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Passed GitHub Actions and the manual test described above.
Closes #39789 from LuciferYang/SPARK-42228.
Authored-by: yangjie01 <yangjie01@baidu.com>
Signed-off-by: Herman van Hovell <herman@databricks.com>
(cherry picked from commit 48ab301)
Signed-off-by: Herman van Hovell <herman@databricks.com>